In this workshop, we will explore the advanced features of the Nextflow language and runtime, and learn how to use them to write efficient and scalable data-intensive workflows. We will cover topics such as parallel execution, error handling, and workflow customization.
Nextflow is based on the dataflow programming model, in which processes communicate through channels.
In Nextflow, a process is the basic processing primitive used to execute a user script. The process definition starts with the keyword process, followed by the process name and finally the process body, delimited by curly braces. The process body must contain a string that represents the command or, more generally, the script to be executed. A basic process looks like the following example:
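As a minimal sketch of such a definition (the process name SAY_HELLO and the echo command are illustrative assumptions, not taken from the workshop scripts):

```nextflow
// Hypothetical process: the name and the command are placeholders
process SAY_HELLO {
    output:
    stdout

    script:
    """
    echo 'Hello world!'
    """
}

workflow {
    // Run the process and print whatever it wrote to standard output
    SAY_HELLO() | view
}
```

The string after script: is the command the process executes; the output block is only needed here so that the result can be passed to view.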
In this chapter, we take a curated tour of the Nextflow operators. Commonly used and well-understood operators are not covered here; we focus on those that deserve more attention or whose usage can be more elaborate. This set of operators has been chosen to illustrate tangential concepts and Nextflow features.
The flatten operator transforms a channel so that every tuple or list it carries is flattened and each entry is emitted as a separate item by the resulting channel.
Example:
cat bin/example_flatten.nf
// Channel.of replaces the deprecated Channel.from; each list is emitted as one item
input_ch = Channel.of(["path/file1.fastq", "path/file2.fastq"], ["path/file3.fastq", "path/file4.fastq"], ["path/file5.fastq", "path/file6.fastq"])
input_ch
    .flatten()
    .view()
nextflow run bin/example_flatten.nf
## N E X T F L O W ~ version 23.04.1
## Launching `bin/example_flatten.nf` [trusting_ride] DSL2 - revision: d714f8acfd
## path/file1.fastq
## path/file2.fastq
## path/file3.fastq
## path/file4.fastq
## path/file5.fastq
## path/file6.fastq
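Nested lists are unpacked as well, which is worth keeping in mind when a channel carries tuples that themselves contain lists. A minimal sketch with made-up values (not one of the workshop's bin/ scripts):

```nextflow
// Illustrative values only: nested lists are flattened recursively
Channel.of( [1, [2, 3]], 4, [5, [6]] )
    .flatten()
    .view()
```

Each of the six numbers is emitted as its own item, regardless of how deeply it was nested in the input.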
The collect operator gathers all of the items emitted by a channel into a list and emits that list as a single item.
Example:
cat bin/example_collect.nf
workflow {
    sample_ch = Channel.fromFilePairs("./data/reads/*_{1,2}.fastq.gz", checkIfExists: true)
        .collect()
    sample_ch.view()
}
nextflow run bin/example_collect.nf
## N E X T F L O W ~ version 23.04.1
## Launching `bin/example_collect.nf` [loving_baekeland] DSL2 - revision: 0eef816596
## ['SRR6357076', [/home/sinrasu/nextflow-workshop/data/reads/SRR6357076_1.fastq.gz, /home/sinrasu/nextflow-workshop/data/reads/SRR6357076_2.fastq.gz], 'SRR6357071', [/home/sinrasu/nextflow-workshop/data/reads/SRR6357071_1.fastq.gz, /home/sinrasu/nextflow-workshop/data/reads/SRR6357071_2.fastq.gz], 'SRR6357070', [/home/sinrasu/nextflow-workshop/data/reads/SRR6357070_1.fastq.gz, /home/sinrasu/nextflow-workshop/data/reads/SRR6357070_2.fastq.gz], 'SRR6357072', [/home/sinrasu/nextflow-workshop/data/reads/SRR6357072_1.fastq.gz, /home/sinrasu/nextflow-workshop/data/reads/SRR6357072_2.fastq.gz]]
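In the output above, the sample IDs and their file lists sit side by side in one list, because collect unpacks the collected tuples into the resulting list by default. If each tuple should stay intact, the flat option can be switched off. A minimal sketch, using made-up sample names and file names in place of the files under data/reads:

```nextflow
// Hypothetical tuples standing in for the output of fromFilePairs
Channel.of( ['sampleA', ['a_1.fq', 'a_2.fq']], ['sampleB', ['b_1.fq', 'b_2.fq']] )
    .collect(flat: false)   // keep each [id, files] tuple as one entry
    .view()
```

With flat: false, the single emitted list holds one [id, files] entry per sample, which is usually easier to work with downstream.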
The groupTuple operator collects tuples (or lists) of values emitted by the source channel, grouping the elements that share the same key. Finally, it emits a new tuple object for each distinct key collected.
Example:
cat bin/example_groupTuple.nf
ch = channel
    .of( ['wt','wt_1.fq'], ['wt','wt_2.fq'], ['mut','mut_1.fq'], ['mut','mut_2.fq'] )
    .groupTuple()
    .view()
nextflow run bin/example_groupTuple.nf
## N E X T F L O W ~ version 23.04.1
## Launching `bin/example_groupTuple.nf` [determined_williams] DSL2 - revision: 14dfed2105
## [wt, [wt_1.fq, wt_2.fq]]
## [mut, [mut_1.fq, mut_2.fq]]
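By default, groupTuple waits until the source channel is complete before emitting anything, since it cannot know whether more items with the same key will arrive. When the number of items per key is known in advance, the size option lets each group be emitted as soon as it is full. A sketch reusing the values from the example above, under the assumption that every key has exactly two entries:

```nextflow
channel
    .of( ['wt','wt_1.fq'], ['wt','wt_2.fq'], ['mut','mut_1.fq'], ['mut','mut_2.fq'] )
    .groupTuple(size: 2)   // assumption: exactly two entries per key
    .view()
```

In a real pipeline this can let downstream processes start on complete groups while other samples are still being produced.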
The branch operator forwards each item emitted by a source channel to one of several output channels.
The selection criterion is a closure containing one or more boolean expressions, each identified by a unique label. Each item is sent to the output channel named after the first label whose expression evaluates to true. For example:
cat bin/example_branch.nf
workflow {
    params.input = "data/samplesheet.csv"
    Channel.fromPath(params.input)
        .splitCsv(header: true)
        .map { row -> [[sample: row.sample, strand: row.type], [row.fastq_1, row.fastq_2]] }
        .branch { meta, reads ->
            single: meta.strand == "single"
            paired: meta.strand == "paired"
        }
        .set { samples }
    samples.single.view()
}
nextflow run bin/example_branch.nf
## N E X T F L O W ~ version 23.04.1
## Launching `bin/example_branch.nf` [amazing_maxwell] DSL2 - revision: 8b1f242980
## [[sample:WT_REP1, strand:single], [https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357070_1.fastq.gz, https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357070_2.fastq.gz]]
## [[sample:WT_REP1, strand:single], [https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357071_1.fastq.gz, https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357071_2.fastq.gz]]
## [[sample:WT_REP2, strand:single], [https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357072_1.fastq.gz, https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357072_2.fastq.gz]]
## [[sample:RAP1_IAA_30M_REP1, strand:single], [https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357076_1.fastq.gz, https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357076_2.fastq.gz]]
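Items that match none of the expressions are discarded. A catch-all label whose expression is simply true can collect them instead. A minimal sketch with illustrative numbers rather than the samplesheet (the labels small, large and other are arbitrary):

```nextflow
Channel.of( 1, 2, 3, 40, 50 )
    .branch { n ->
        small: n < 10
        large: n < 50
        other: true      // fallback: anything not matched above
    }
    .set { result }

result.small.view { v -> "$v is small" }
result.large.view { v -> "$v is large" }
result.other.view { v -> "$v is other" }
```

Because only the first matching label wins, the value 3 goes to small even though it is also below 50, and 50 ends up in other.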
A common Nextflow pattern is for a simple samplesheet to be passed as the primary input to a workflow. We’ll see some more complicated ways to manage these inputs later on in the workshop, but the splitCsv operator is an excellent tool to have in a pinch. It parses a CSV/TSV file and returns a channel in which each item is one row of that file:
cat bin/example_splitCsv.nf
workflow {
    params.input = "https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.10/samplesheet_test.csv"
    Channel.fromPath(params.input)
        .splitCsv(header: true)
        .view()
}
Output:
nextflow run bin/example_splitCsv.nf
## N E X T F L O W ~ version 23.04.1
## Launching `bin/example_splitCsv.nf` [admiring_monod] DSL2 - revision: 5650611e54
## [sample:WT_REP1, fastq_1:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357070_1.fastq.gz, fastq_2:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357070_2.fastq.gz, strandedness:auto]
## [sample:WT_REP1, fastq_1:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357071_1.fastq.gz, fastq_2:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357071_2.fastq.gz, strandedness:auto]
## [sample:WT_REP2, fastq_1:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357072_1.fastq.gz, fastq_2:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357072_2.fastq.gz, strandedness:reverse]
## [sample:RAP1_UNINDUCED_REP1, fastq_1:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357073_1.fastq.gz, fastq_2:, strandedness:reverse]
## [sample:RAP1_UNINDUCED_REP2, fastq_1:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357074_1.fastq.gz, fastq_2:, strandedness:reverse]
## [sample:RAP1_UNINDUCED_REP2, fastq_1:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357075_1.fastq.gz, fastq_2:, strandedness:reverse]
## [sample:RAP1_IAA_30M_REP1, fastq_1:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357076_1.fastq.gz, fastq_2:https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357076_2.fastq.gz, strandedness:reverse]
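splitCsv assumes comma-separated values unless told otherwise. For a tab-separated sheet, the sep option sets the delimiter. A sketch, assuming a hypothetical file data/samplesheet.tsv with the same header as the CSV above:

```nextflow
// Hypothetical path: point this at a real tab-separated samplesheet
params.input = "data/samplesheet.tsv"

workflow {
    Channel.fromPath(params.input)
        .splitCsv(header: true, sep: '\t')   // tab as the field separator
        .map { row -> [row.sample, row.fastq_1, row.fastq_2] }
        .view()
}
```

With header: true each row is a map keyed by column name, so the map step can pick out just the fields a downstream process needs.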
The multiMap operator takes a single input channel and, for each input element, emits one item into each of several named output channels.
Example:
cat bin/example_multiMap.nf
Channel.of( 1, 2, 3, 4, 5 )
    .multiMap {
        small: it
        large: it * 10
    }
    .set { numbers }
numbers.small | view { num -> "Small: $num" }
numbers.large | view { num -> "Large: $num" }
nextflow run bin/example_multiMap.nf
## N E X T F L O W ~ version 23.04.1
## Launching `bin/example_multiMap.nf` [friendly_picasso] DSL2 - revision: 3d406c0ad2
## Small: 1
## Small: 2
## Small: 3
## Large: 10
## Small: 4
## Small: 5
## Large: 20
## Large: 30
## Large: 40
## Large: 50
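The same pattern is handy for splitting a channel of tuples into parallel channels that stay in step with one another, for example to feed two separate process inputs. A sketch with made-up sample tuples (the labels ids and reads are arbitrary choices):

```nextflow
// Hypothetical [sample, fastq_1, fastq_2] tuples
Channel.of( ['wt', 'wt_1.fq', 'wt_2.fq'], ['mut', 'mut_1.fq', 'mut_2.fq'] )
    .multiMap { sample, fq1, fq2 ->
        ids: sample            // one channel with just the sample names
        reads: [fq1, fq2]      // one channel with the matching file pairs
    }
    .set { result }

result.ids.view { v -> "id: $v" }
result.reads.view { v -> "reads: $v" }
```

As in the output above, lines from the two channels may interleave when printed, but within each channel the items keep the order of the source.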
The exercises below are designed to strengthen your knowledge of Nextflow. The solution to each problem is blurred; look at it only after attempting the problem yourself. Should you need any help, please ask one of the instructors.
1. The sample sheet below contains multiple entries for some samples and a mix of single-end and paired-end reads. Create separate channels for samples with a single entry and samples with multiple entries.
sample,fastq_1,fastq_2,strandedness
WT_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357070_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357070_2.fastq.gz,auto
WT_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357071_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357071_2.fastq.gz,auto
WT_REP2,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357072_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357072_2.fastq.gz,reverse
RAP1_UNINDUCED_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357073_1.fastq.gz,,reverse
RAP1_UNINDUCED_REP2,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357074_1.fastq.gz,,reverse
RAP1_UNINDUCED_REP2,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357075_1.fastq.gz,,reverse
RAP1_IAA_30M_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357076_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357076_2.fastq.gz,reverse
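One possible approach to this exercise, sketched under the assumption that "multiple entries" means a sample name appearing on more than one row, and that the sheet is the samplesheet_test.csv used in the splitCsv example. It combines splitCsv, groupTuple and branch; the channel labels single and multiple are arbitrary, and other solutions are equally valid:

```nextflow
params.input = "https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.10/samplesheet_test.csv"

workflow {
    Channel.fromPath(params.input)
        .splitCsv(header: true)
        // Key each row by sample name; keep both fastq columns together
        .map { row -> [row.sample, [row.fastq_1, row.fastq_2]] }
        // Gather all rows that share a sample name
        .groupTuple()
        // Route each sample by how many rows it had in the sheet
        .branch { sample, reads ->
            single: reads.size() == 1
            multiple: reads.size() > 1
        }
        .set { samples }

    samples.single.view { v -> "single: $v" }
    samples.multiple.view { v -> "multiple: $v" }
}
```

Single-end rows keep an empty second entry in their file pair here; filtering those out is a natural extension of the exercise.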